Visualisation Group Project Part - 2¶

Data Cleaning¶

Overview:

  1. Deleting columns and records with too many missing values
  2. Filling the missing values with the average of that respective column for each country

Importing Libraries¶

In [1]:
import numpy as np
import pandas as pd

Loading the dataset into the dataframe and verifying that it has properly loaded¶

In [2]:
path = 'global-data-on-sustainable-energy.csv'
df = pd.read_csv(path)
df.head()
Out[2]:
| | Entity | Year | Access to electricity (% of population) | Access to clean fuels for cooking | Renewable-electricity-generating-capacity-per-capita | Financial flows to developing countries (US $) | Renewable energy share in the total final energy consumption (%) | Electricity from fossil fuels (TWh) | Electricity from nuclear (TWh) | Electricity from renewables (TWh) | ... | Primary energy consumption per capita (kWh/person) | Energy intensity level of primary energy (MJ/$2017 PPP GDP) | Value_co2_emissions_kt_by_country | Renewables (% equivalent primary energy) | gdp_growth | gdp_per_capita | Density\n(P/Km2) | Land Area(Km2) | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2000 | 1.613591 | 6.2 | 9.22 | 20000.0 | 44.99 | 0.16 | 0.0 | 0.31 | ... | 302.59482 | 1.64 | 760.000000 | NaN | NaN | NaN | 60 | 652230.0 | 33.93911 | 67.709953 |
| 1 | Afghanistan | 2001 | 4.074574 | 7.2 | 8.86 | 130000.0 | 45.60 | 0.09 | 0.0 | 0.50 | ... | 236.89185 | 1.74 | 730.000000 | NaN | NaN | NaN | 60 | 652230.0 | 33.93911 | 67.709953 |
| 2 | Afghanistan | 2002 | 9.409158 | 8.2 | 8.47 | 3950000.0 | 37.83 | 0.13 | 0.0 | 0.56 | ... | 210.86215 | 1.40 | 1029.999971 | NaN | NaN | 179.426579 | 60 | 652230.0 | 33.93911 | 67.709953 |
| 3 | Afghanistan | 2003 | 14.738506 | 9.5 | 8.09 | 25970000.0 | 36.66 | 0.31 | 0.0 | 0.63 | ... | 229.96822 | 1.40 | 1220.000029 | NaN | 8.832278 | 190.683814 | 60 | 652230.0 | 33.93911 | 67.709953 |
| 4 | Afghanistan | 2004 | 20.064968 | 10.9 | 7.75 | NaN | 44.24 | 0.33 | 0.0 | 0.56 | ... | 204.23125 | 1.20 | 1029.999971 | NaN | 1.414118 | 211.382074 | 60 | 652230.0 | 33.93911 | 67.709953 |

5 rows × 21 columns

In [3]:
#we see that there are 176 countries in the data set
df['Entity'].nunique()
Out[3]:
176

Checking the dataset for how many null values each column has¶

In [4]:
df.isna().sum()
Out[4]:
Entity                                                                 0
Year                                                                   0
Access to electricity (% of population)                               10
Access to clean fuels for cooking                                    169
Renewable-electricity-generating-capacity-per-capita                 931
Financial flows to developing countries (US $)                      2089
Renewable energy share in the total final energy consumption (%)     194
Electricity from fossil fuels (TWh)                                   21
Electricity from nuclear (TWh)                                       126
Electricity from renewables (TWh)                                     21
Low-carbon electricity (% electricity)                                42
Primary energy consumption per capita (kWh/person)                     0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)          207
Value_co2_emissions_kt_by_country                                    428
Renewables (% equivalent primary energy)                            2137
gdp_growth                                                           317
gdp_per_capita                                                       282
Density\n(P/Km2)                                                       1
Land Area(Km2)                                                         1
Latitude                                                               1
Longitude                                                              1
dtype: int64
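
The raw counts above are easier to compare as a share of all rows. The following is a minimal sketch on made-up data (the toy column names and the 50% cut-off are assumptions for illustration, not values used in this project) showing how the share of missing values per column can be computed and used to flag sparse columns.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "mostly_present": [1.0, 2.0, np.nan, 4.0],
})

# Share of missing values per column, as a fraction between 0 and 1.
missing_share = toy.isna().mean()

# Hypothetical cut-off: flag columns where more than half the values are missing.
threshold = 0.5
sparse_columns = missing_share[missing_share > threshold].index.tolist()
print(sparse_columns)
```

Columns flagged this way are candidates for dropping, while columns with only a few gaps are better handled by dropping rows or filling values.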

1. Dropping columns and rows with too many missing/null values¶

In [5]:
df.drop(['Financial flows to developing countries (US $)', 'Renewables (% equivalent primary energy)', 'Renewable-electricity-generating-capacity-per-capita'], axis=1, inplace = True)
In [6]:
#since geospatial data is important, we want to look at which row has a missing latitude value
df[df['Latitude'].isnull()]
Out[6]:
| | Entity | Year | Access to electricity (% of population) | Access to clean fuels for cooking | Renewable energy share in the total final energy consumption (%) | Electricity from fossil fuels (TWh) | Electricity from nuclear (TWh) | Electricity from renewables (TWh) | Low-carbon electricity (% electricity) | Primary energy consumption per capita (kWh/person) | Energy intensity level of primary energy (MJ/$2017 PPP GDP) | Value_co2_emissions_kt_by_country | gdp_growth | gdp_per_capita | Density\n(P/Km2) | Land Area(Km2) | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1218 | French Guiana | 2000 | NaN | NaN | 23.84 | 0.43 | 0.0 | 0.0 | 0.0 | 13692.394 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
In [7]:
#we also find that French Guiana has only one entry in the whole dataset
df[df['Entity'] == 'French Guiana']
Out[7]:
| | Entity | Year | Access to electricity (% of population) | Access to clean fuels for cooking | Renewable energy share in the total final energy consumption (%) | Electricity from fossil fuels (TWh) | Electricity from nuclear (TWh) | Electricity from renewables (TWh) | Low-carbon electricity (% electricity) | Primary energy consumption per capita (kWh/person) | Energy intensity level of primary energy (MJ/$2017 PPP GDP) | Value_co2_emissions_kt_by_country | gdp_growth | gdp_per_capita | Density\n(P/Km2) | Land Area(Km2) | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1218 | French Guiana | 2000 | NaN | NaN | 23.84 | 0.43 | 0.0 | 0.0 | 0.0 | 13692.394 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
In [8]:
#therefore, we should drop it
df = df.dropna(subset=['Latitude'])
In [9]:
#checking to make sure it's been dropped
df[df['Entity'] == 'French Guiana']
Out[9]:
Entity Year Access to electricity (% of population) Access to clean fuels for cooking Renewable energy share in the total final energy consumption (%) Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh) Low-carbon electricity (% electricity) Primary energy consumption per capita (kWh/person) Energy intensity level of primary energy (MJ/$2017 PPP GDP) Value_co2_emissions_kt_by_country gdp_growth gdp_per_capita Density\n(P/Km2) Land Area(Km2) Latitude Longitude
(empty DataFrame: 0 rows, so French Guiana has been removed)
In [10]:
#now we want to see what other missing values we need to deal with
df.isna().sum()
Out[10]:
Entity                                                                0
Year                                                                  0
Access to electricity (% of population)                               9
Access to clean fuels for cooking                                   168
Renewable energy share in the total final energy consumption (%)    194
Electricity from fossil fuels (TWh)                                  21
Electricity from nuclear (TWh)                                      126
Electricity from renewables (TWh)                                    21
Low-carbon electricity (% electricity)                               42
Primary energy consumption per capita (kWh/person)                    0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)         206
Value_co2_emissions_kt_by_country                                   427
gdp_growth                                                          316
gdp_per_capita                                                      281
Density\n(P/Km2)                                                      0
Land Area(Km2)                                                        0
Latitude                                                              0
Longitude                                                             0
dtype: int64
In [11]:
#Dropping the null values from Access to Electricity attribute
del_missing_rows = ['Access to electricity (% of population)']

df = df.dropna(subset = del_missing_rows)

2. Filling the rest of the missing values by taking the averages of each column grouped by countries¶

In [12]:
mean_fill = ['Access to clean fuels for cooking', 'Renewable energy share in the total final energy consumption (%)', 'Energy intensity level of primary energy (MJ/$2017 PPP GDP)',
                        'Electricity from nuclear (TWh)', 'Value_co2_emissions_kt_by_country', 'gdp_growth', 'gdp_per_capita']
df[mean_fill] = df.groupby('Entity')[mean_fill].transform(lambda x: x.fillna(x.mean()))

df = df.dropna(subset=mean_fill)
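
The final `dropna` matters because a per-country mean only exists when that country has at least one observed value. Below is a small sketch on made-up data (the two toy countries and their values are assumptions for illustration) showing both cases.

```python
import numpy as np
import pandas as pd

# Toy data: country A has one gap, country B has no observed values at all.
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B"],
    "gdp_per_capita": [100.0, np.nan, 300.0, np.nan, np.nan],
})

# Fill each gap with the mean of the same country's observed values.
toy["gdp_per_capita"] = (
    toy.groupby("Entity")["gdp_per_capita"]
       .transform(lambda s: s.fillna(s.mean()))
)

# A's gap becomes 200.0. B's mean is NaN, so its gaps cannot be filled.
still_missing = toy[toy["gdp_per_capita"].isna()]["Entity"].unique().tolist()

# Dropping the remaining NaN rows removes country B entirely.
cleaned = toy.dropna(subset=["gdp_per_capita"])
print(still_missing, cleaned["Entity"].unique().tolist())
```

This is why the number of countries can shrink after the fill step: any country with a column that is missing in every year drops out.
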
In [13]:
#now we can see that all the null values have been dealt with
df.isna().sum()
Out[13]:
Entity                                                              0
Year                                                                0
Access to electricity (% of population)                             0
Access to clean fuels for cooking                                   0
Renewable energy share in the total final energy consumption (%)    0
Electricity from fossil fuels (TWh)                                 0
Electricity from nuclear (TWh)                                      0
Electricity from renewables (TWh)                                   0
Low-carbon electricity (% electricity)                              0
Primary energy consumption per capita (kWh/person)                  0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)         0
Value_co2_emissions_kt_by_country                                   0
gdp_growth                                                          0
gdp_per_capita                                                      0
Density\n(P/Km2)                                                    0
Land Area(Km2)                                                      0
Latitude                                                            0
Longitude                                                           0
dtype: int64
In [14]:
df['Entity'].nunique()
#worth noting that 28 of the original 176 countries were removed (they did not have enough information)
Out[14]:
148
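
Comparing this count with the 176 countries found at the start gives 176 - 148 = 28 countries removed during cleaning. To see which ones, the entity sets before and after cleaning can be compared. The sketch below uses made-up data; `before` and `after` are hypothetical names standing in for the raw and cleaned dataframes.

```python
import pandas as pd

# Toy stand-ins for the raw and cleaned dataframes.
before = pd.DataFrame({
    "Entity": ["A", "A", "B", "C"],
    "value": [1.0, 2.0, None, 3.0],
})
after = before.dropna(subset=["value"])

# Countries present before cleaning but absent afterwards.
dropped = sorted(set(before["Entity"]) - set(after["Entity"]))
n_dropped = before["Entity"].nunique() - after["Entity"].nunique()
print(dropped, n_dropped)
```

In the notebook this would require keeping a copy of the raw dataframe before any rows are dropped.
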
In [15]:
df.count()
#Updated number of observations
Out[15]:
Entity                                                              3072
Year                                                                3072
Access to electricity (% of population)                             3072
Access to clean fuels for cooking                                   3072
Renewable energy share in the total final energy consumption (%)    3072
Electricity from fossil fuels (TWh)                                 3072
Electricity from nuclear (TWh)                                      3072
Electricity from renewables (TWh)                                   3072
Low-carbon electricity (% electricity)                              3072
Primary energy consumption per capita (kWh/person)                  3072
Energy intensity level of primary energy (MJ/$2017 PPP GDP)         3072
Value_co2_emissions_kt_by_country                                   3072
gdp_growth                                                          3072
gdp_per_capita                                                      3072
Density\n(P/Km2)                                                    3072
Land Area(Km2)                                                      3072
Latitude                                                            3072
Longitude                                                           3072
dtype: int64
In [16]:
df.describe()
Out[16]:
| | Year | Access to electricity (% of population) | Access to clean fuels for cooking | Renewable energy share in the total final energy consumption (%) | Electricity from fossil fuels (TWh) | Electricity from nuclear (TWh) | Electricity from renewables (TWh) | Low-carbon electricity (% electricity) | Primary energy consumption per capita (kWh/person) | Energy intensity level of primary energy (MJ/$2017 PPP GDP) | Value_co2_emissions_kt_by_country | gdp_growth | gdp_per_capita | Land Area(Km2) | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3072.000000 | 3.072000e+03 | 3072.000000 | 3072.000000 | 3.072000e+03 | 3072.000000 | 3072.000000 |
| mean | 2010.070964 | 76.424935 | 61.616748 | 36.056487 | 75.848330 | 15.023783 | 26.999844 | 39.157532 | 25688.044730 | 5.464635 | 1.655141e+05 | 3.531519 | 12471.970232 | 6.503822e+05 | 17.969060 | 13.962829 |
| std | 6.046076 | 31.518466 | 39.537019 | 30.146501 | 376.910325 | 78.012828 | 113.049668 | 34.486952 | 36850.453626 | 3.604509 | 8.089061e+05 | 5.098920 | 18698.930772 | 1.689525e+06 | 24.739081 | 65.563072 |
| min | 2000.000000 | 1.252269 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 105.110120 | 1.030000 | 3.000000e+01 | -36.658153 | 111.927225 | 2.100000e+01 | -40.900557 | -175.198242 |
| 25% | 2005.000000 | 52.565432 | 20.075000 | 9.207500 | 0.280000 | 0.000000 | 0.070000 | 5.510743 | 2716.838475 | 3.297500 | 1.990000e+03 | 1.456106 | 1193.649479 | 2.889600e+04 | 1.650801 | -9.696645 |
| 50% | 2010.000000 | 97.000000 | 80.750000 | 29.180000 | 2.930000 | 0.000000 | 1.620000 | 34.111065 | 11674.621500 | 4.445000 | 9.920000e+03 | 3.626519 | 4156.085428 | 1.319570e+05 | 15.870032 | 18.732207 |
| 75% | 2015.000000 | 100.000000 | 100.000000 | 61.127500 | 24.977500 | 0.000000 | 10.065000 | 66.626146 | 30963.203000 | 6.180000 | 5.677250e+04 | 5.934180 | 14615.705528 | 4.881000e+05 | 39.074208 | 45.038189 |
| max | 2020.000000 | 100.000000 | 100.000000 | 96.040000 | 5184.130000 | 809.410000 | 2184.940000 | 100.000010 | 262585.700000 | 32.570000 | 1.070722e+07 | 63.379875 | 123514.196700 | 9.984670e+06 | 64.963051 | 178.065032 |

Data Additions for Group Project Part - 2¶

New column for continents added¶

Why?

  1. Grouping countries by region enables regional analysis and the comparison of trends across regions.
  2. Colouring or filtering by continent helps prevent overplotting and reduces visual density in the charts.
In [17]:
#creating a new continents column
continent_df = pd.read_excel('Continents.xlsx')

#mapping each country to their respective continent
continent_mapping = {}
for continent in continent_df.columns:
    countries = continent_df[continent].dropna().tolist()
    for country in countries:
        continent_mapping[country] = continent

df['Continent'] = df['Entity'].map(continent_mapping)
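
The loop above implies that `Continents.xlsx` holds one column per continent with country names listed underneath. As an alternative sketch under that same assumption, the wide table can be reshaped with `melt`, and any entity without a match can be listed directly. The toy table and the entity names below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Continents.xlsx: one column per continent.
continent_df = pd.DataFrame({
    "Asia": ["Afghanistan", "Japan"],
    "Europe": ["France", None],
})

# Reshape to one (Continent, Entity) pair per row and drop the padding cells.
long_df = (
    continent_df.melt(var_name="Continent", value_name="Entity")
                .dropna(subset=["Entity"])
)
continent_mapping = dict(zip(long_df["Entity"], long_df["Continent"]))

# Map a few entities and list the ones that found no continent.
entities = pd.Series(["Japan", "France", "Atlantis"])
mapped = entities.map(continent_mapping)
unmapped = entities[mapped.isna()].tolist()
print(unmapped)
```

Listing unmapped names is useful because spelling differences between the two files would otherwise surface only as NaN values in the new column.
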
In [18]:
#Making sure we don't have any null values for continent
df[df['Continent'].isna() == True]
Out[18]:
Entity Year Access to electricity (% of population) Access to clean fuels for cooking Renewable energy share in the total final energy consumption (%) Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh) Low-carbon electricity (% electricity) Primary energy consumption per capita (kWh/person) Energy intensity level of primary energy (MJ/$2017 PPP GDP) Value_co2_emissions_kt_by_country gdp_growth gdp_per_capita Density\n(P/Km2) Land Area(Km2) Latitude Longitude Continent
(empty DataFrame: 0 rows, so every country was mapped to a continent)
In [19]:
#Converting density and year into integer type
df['Year'] = df['Year'].astype(int)
df['Density\\n(P/Km2)'] = df['Density\\n(P/Km2)'].apply(lambda x: int(x.replace(",", "")))
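
The density column is read as text because larger values contain thousands separators. A minimal sketch of the same conversion using vectorised string methods, on made-up values:

```python
import pandas as pd

# Toy density values stored as strings, some with thousands separators.
density = pd.Series(["60", "1,380", "25"])

# Remove the commas as literal characters, then convert to integers.
density_int = density.str.replace(",", "", regex=False).astype(int)
print(density_int.tolist())
```

Both approaches assume every value is a string; a missing value in the column would need to be handled before the conversion.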

Data Visualisations¶

In [20]:
#importing necessary libraries to obtain the visualisation figures
import pandas as pd
import plotly.express as px
from ipywidgets import interact

Research Question 1¶

What is the relationship between access to electricity and GDP per capita across countries or regions?¶

Visualisation 1.1 (Used in the Executive Summary)¶

In [21]:
#Aggregating the attributes grouping by Year.
df_agg = df.groupby('Year').agg({
    'Access to electricity (% of population)': 'mean',
    'gdp_per_capita': 'mean'
}).reset_index()

# Create a line plot
fig = px.line(df_agg,
              x='gdp_per_capita',
              y='Access to electricity (% of population)',
              text='Year',
              title='Access to Electricity vs GDP per Capita by Year')

# Overlay scatter points on top of the line plot
fig.add_scatter(x=df_agg['gdp_per_capita'],
                y=df_agg['Access to electricity (% of population)'],
                mode='markers',
                marker=dict(color='black'),
                showlegend = False,
                text=df_agg['Year'])

# Customization
fig.update_traces(textposition='top center')  # Adjust the position of the text label

fig.update_layout(
    xaxis_title='<b>Average GDP per Capita ($)</b>',
    yaxis_title='<b>Average Access to Electricity (% of population)</b>',
    autosize=False,  # Turn off autosizing
    width=800,  # Set width of the plot
    height=600,  # Set height of the plot
    title={'text': '<b>Access to Electricity (%) vs GDP per Capita ($) by Year</b>', 'x': 0.5, 'y': 0.95, 'xanchor': 'center', 'yanchor': 'top'},  # Center the title
    plot_bgcolor='white',  # Set plot background color to white
    xaxis=dict(gridcolor='lightgray'),  # Add grid lines
    yaxis=dict(gridcolor='lightgray')  # Add grid lines
)

fig.update_traces(line=dict(color='mistyrose'))

# Show the plot
fig.show()

Visualisation 1.2¶

In [22]:
numerical_columns = ['Year', 'Access to electricity (% of population)', 'Access to clean fuels for cooking',
                     'Renewable energy share in the total final energy consumption (%)',
                     'Electricity from fossil fuels (TWh)', 'Electricity from nuclear (TWh)',
                     'Electricity from renewables (TWh)', 'Low-carbon electricity (% electricity)',
                     'Primary energy consumption per capita (kWh/person)',
                     'Energy intensity level of primary energy (MJ/$2017 PPP GDP)',
                     'Value_co2_emissions_kt_by_country', 'gdp_growth', 'gdp_per_capita', 'Land Area(Km2)', 'Latitude', 'Longitude']

avg_data = df.groupby('Entity').agg({
    'Continent': 'first',
    **{col: 'mean' for col in numerical_columns}
}).reset_index()

#Using the scatter function to plot the scatterplot
fig = px.scatter(avg_data, y="Access to electricity (% of population)", x="gdp_per_capita",
                 color="Continent", hover_name="Entity",
                 title="<b>Access to Electricity vs GDP per Capita by Region</b>",
                 log_x=True, #We did the log transformation for the x axis
                 color_discrete_sequence=px.colors.qualitative.Set1)

# Modifying the code to create a dropdown menu
# Calculate the number of unique continents to create visibility arrays
num_continents = len(avg_data['Continent'].unique())
continent_buttons = []

# Add a button for selecting 'All'
continent_buttons.append(
    dict(label='All',
         method='update',
         args=[{'visible': [True] * num_continents},
               {'title': '<b>Access to Electricity vs GDP per Capita by Region</b>'}])
)

# Add a button for each continent
for continent in avg_data['Continent'].unique():
    visibility = [continent == cont for cont in avg_data['Continent'].unique()]
    continent_buttons.append(
        dict(label=continent,
             method='update',
             args=[{'visible': visibility},
                   {'title': f'<b>Access to Electricity vs GDP per Capita: {continent}</b>'}])
    )

#Customising the layout after creating the raw figure
fig.update_layout(
    updatemenus=[dict(active=0,
                      buttons=continent_buttons,
                      x=0.75,
                      xanchor='left',
                      y=1.15,
                      yanchor='top')],
    xaxis_title='<b>GDP Per Capita</b>',
    yaxis_title='<b>Access to electricity (% of population)</b>',
    xaxis=dict(showgrid=True, gridcolor='lightgray'),  # Show x-axis gridlines
    yaxis=dict(showgrid=True, gridcolor='lightgray'),  # Show y-axis gridlines
    plot_bgcolor='white'
)

fig.show()
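
The dropdown works by switching whole traces on and off. It appears to rely on Plotly Express creating one trace per continent, in the order the continents first appear in the data, so each button needs a list of booleans with one entry per trace. The sketch below shows only that mask logic in plain Python; the continent list and the helper name `visibility_mask` are assumptions for illustration.

```python
# Hypothetical trace order: one trace per continent, in order of first appearance.
continents = ["Asia", "Europe", "Africa"]


def visibility_mask(selected, trace_names):
    """Return one boolean per trace: True only for the selected continent."""
    return [name == selected for name in trace_names]


# One mask per continent button, plus an 'All' button that shows every trace.
masks = {c: visibility_mask(c, continents) for c in continents}
masks["All"] = [True] * len(continents)

print(masks["Europe"])  # [False, True, False]
```

If the trace order ever differed from the order of `unique()`, the buttons would toggle the wrong continents, so the two must be built from the same sequence.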

Research Question 2¶

Are there any notable correlations between carbon emissions and the increase in the renewable energy share in total final energy consumption?¶

Visualisation 2.1 (Used in the Executive Summary)¶

In [23]:
#Filtering out the outliers found in the exploratory analysis for unbiased analysis
avg_data2 = avg_data[~avg_data['Entity'].isin(['United States', 'China', 'India', 'Japan'])]

fig = px.scatter(avg_data2, x="Renewable energy share in the total final energy consumption (%)", y="Value_co2_emissions_kt_by_country", color="Continent",
                 hover_name="Entity", title="CO2 Emissions vs Renewable Energy Share", width=800, height=600)

fig.update_yaxes(title="Average CO2 Emissions (kt)")  # the column holds kilotonnes per country, not per-capita values


#Customising our raw figure
fig.update_layout(
    autosize=False,
    width=1000,
    height=800,
    title={'text': '<b>CO2 Emissions vs Renewable Energy Share</b>', 'x': 0.5, 'y': 0.95, 'xanchor': 'center', 'yanchor': 'top'},  # Center the title
    plot_bgcolor='white',  # Set plot background color to white
    xaxis=dict(showgrid=True, gridcolor='lightgrey'),  # Show x-axis gridlines with light grey color
    yaxis=dict(showgrid=True, gridcolor='lightgrey')   # Show y-axis gridlines with light grey color
)

# Show the plot
fig.show()
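
The four countries excluded above were picked by hand from the exploratory analysis. As a point of comparison only, the sketch below flags outliers with the common 1.5 x IQR rule. This is a swapped-in technique, not the method used in this notebook, and the emission values are made up.

```python
import pandas as pd

# Toy average emissions per country (made-up values, in kt).
emissions = pd.Series(
    [50.0, 60.0, 70.0, 80.0, 90.0, 5000.0],
    index=["A", "B", "C", "D", "E", "F"],
)

# Upper fence of the 1.5 x IQR rule.
q1, q3 = emissions.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)

# Countries whose average lies above the fence.
outliers = emissions[emissions > upper_fence].index.tolist()
print(upper_fence, outliers)
```

On strongly skewed data such as national emission totals, a rule like this may flag many more countries than a hand-picked list, so the fence multiplier would need to be chosen with care.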

Visualisation 2.2¶

In [24]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

#Selecting the rows for each outlier country for the outlier study

df_china = df[df['Entity'] == 'China']
df_us = df[df['Entity'] == 'United States']
df_ind = df[df['Entity'] == 'India']
df_jpn = df[df['Entity'] == 'Japan']

# Create subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=("China",
                                                     "US",
                                                     "Japan",
                                                     "India"))

# Define a function to add the traces for a given country to the subplots
def add_country_trace(fig, df_country, row, col):
    # Create a line plot for the country
    fig.add_trace(
        go.Scatter(
            x=df_country['Electricity from renewables (TWh)'],
            y=df_country['Value_co2_emissions_kt_by_country'],
            mode='lines+markers+text',
            line=dict(color='mistyrose'),
            marker=dict(color='black'),
            text=df_country['Year'],
            textposition='top center',
            textfont=dict(size=8)
        ),
        row=row, col=col
    )

# Add traces for each country
add_country_trace(fig, df_china, 1, 1)
add_country_trace(fig, df_us, 1, 2)
add_country_trace(fig, df_jpn, 2, 1)
add_country_trace(fig, df_ind, 2, 2)

# Update layout for each subplot
for i in range(1, 5):
    fig.update_xaxes(title_text='Electricity from Renewables (TWh)', showgrid=True, gridcolor='lightgrey', row=(i-1)//2+1, col=(i-1)%2+1)
    fig.update_yaxes(title_text='CO2 Emissions (kt)', showgrid=True, gridcolor='lightgrey', row=(i-1)//2+1, col=(i-1)%2+1)

# Update overall layout
fig.update_layout(height=1000, width=1200, title_text="CO2 Emissions vs Electricity from Renewables (Outlier Study)",
                 plot_bgcolor='white') 

# Show the plot
fig.show()

Research Question 3¶

How has the electricity generation from fossil fuels changed over time, and is there a relationship with the corresponding growth in renewable energy sources?¶

Visualisation 3.1¶

In [25]:
import numpy as np
import plotly.express as px

# Aggregate data

df_aggregated = df.groupby('Year')['Electricity from fossil fuels (TWh)'].sum().reset_index()

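# Approximate 95% band: 1.96 x the standard error of the yearly totals.
# The margin is the same for every year, so this is a rough uncertainty band
# rather than a per-year confidence interval.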
confidence_level = 0.95
n = len(df_aggregated)
std_dev = df_aggregated['Electricity from fossil fuels (TWh)'].std()
margin_of_error = 1.96 * std_dev / np.sqrt(n)

# Add upper and lower bounds
df_aggregated['Upper Bound'] = df_aggregated['Electricity from fossil fuels (TWh)'] + margin_of_error
df_aggregated['Lower Bound'] = df_aggregated['Electricity from fossil fuels (TWh)'] - margin_of_error

# Create a line plot with shaded confidence intervals
fig = px.line(df_aggregated, x='Year', y='Electricity from fossil fuels (TWh)',
              title='Global Trend of Electricity from Fossil Fuels', width=800, height=600)
fig.add_scatter(x=df_aggregated['Year'], y=df_aggregated['Upper Bound'], mode='lines', 
                fill='tonexty', fillcolor='rgba(255, 192, 192, 0.2)', name='Confidence Interval',  # RGB channels range from 0 to 255
                line=dict(color='rgba(0,0,0,0)'))  # Set line color to transparent
fig.add_scatter(x=df_aggregated['Year'], y=df_aggregated['Lower Bound'], mode='lines',
                fill='tonexty', fillcolor='rgba(255, 192, 192, 0.4)', showlegend=False,
                line=dict(color='rgba(0,0,0,0)'))  # Set line color to transparent

# Customizing layout
fig.update_layout(title={'text': '<b>Global Trend of Electricity from Fossil Fuels</b>',
                         'x': 0.4, 'y': 0.9,  # Title anchor: slightly left of centre, near the top
                         'xanchor': 'center', 'yanchor': 'top'},
                  yaxis_title='<b>Electricity from fossil fuels (TWh)</b>',
                  xaxis_title='<b>Year</b>',
                  plot_bgcolor='white',
                  xaxis=dict(showgrid=True, gridcolor='lightgray'),  # Add x-axis gridlines
                  yaxis=dict(showgrid=True, gridcolor='lightgray'))  # Add y-axis gridlines

# Show the plot
fig.show()
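
The band in this figure comes from a single margin, 1.96 x std / sqrt(n), computed from the yearly totals and then added to and subtracted from every year. The worked sketch below repeats that arithmetic on four made-up yearly totals to show that the resulting band has the same width in every year.

```python
import numpy as np
import pandas as pd

# Toy yearly totals in TWh (made-up values).
yearly_totals = pd.Series([10.0, 12.0, 14.0, 16.0], index=[2000, 2001, 2002, 2003])

n = len(yearly_totals)
std_dev = yearly_totals.std()  # sample standard deviation (ddof=1)
margin = 1.96 * std_dev / np.sqrt(n)

# The same margin is applied to every year.
band = pd.DataFrame({
    "lower": yearly_totals - margin,
    "upper": yearly_totals + margin,
})
widths = band["upper"] - band["lower"]
print(round(margin, 3), widths.round(3).tolist())
```

Because the margin reflects how much the totals vary across years, not how uncertain any single year is, the band is best read as a visual guide.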

Visualisation 3.2 (Used in the Executive Summary)¶

In [28]:
#Sorting by country and year, then taking the year-over-year change (marginal change) within each country
df_s = df.sort_values(by=['Entity', 'Year'])
df_s['Electricity from Renewables Growth'] = df_s.groupby('Entity')['Electricity from renewables (TWh)'].diff()
df_s['Electricity from Fossil Fuels Growth'] = df_s.groupby('Entity')['Electricity from fossil fuels (TWh)'].diff()
df_aggregated = df_s.groupby('Year').agg({
    'Electricity from Renewables Growth': 'sum',
    'Electricity from Fossil Fuels Growth': 'sum'
}).reset_index()

#The earliest year has no previous year to compare with, so its summed change is 0 by construction; drop it
df_aggregated = df_aggregated[df_aggregated['Year'] > df_aggregated['Year'].min()]

#importing necessary libraries
import plotly.graph_objects as go
from plotly.subplots import make_subplots

#Creating the raw figure
fig = make_subplots(rows=1, cols=2, subplot_titles=("Electricity from Renewables Growth <br> (Marginal Change)", "Electricity from Fossil Fuels Growth <br> (Marginal Change)"))

fig.add_trace(go.Scatter(x=df_aggregated['Year'], y=df_aggregated['Electricity from Renewables Growth'],
                         mode='lines+markers',
                         name='Electricity from Renewables Growth',
                         line=dict(color='blue', width=2)),
              row=1, col=1)

fig.add_trace(go.Scatter(x=df_aggregated['Year'], y=df_aggregated['Electricity from Fossil Fuels Growth'],
                         mode='lines+markers',
                         name='Electricity from Fossil Fuels Growth',
                         line=dict(color='red', width=2)),
              row=1, col=2)

#Customising the raw figure
fig.update_layout(title='<b>Renewables Growth vs. Fossil Fuel Growth in Electricity Usage over the Years</b>',
                  yaxis_title='Growth (TWh)',
                  showlegend=False, plot_bgcolor = 'white')

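fig.update_xaxes(title_text="Year", row=1, col=1, showgrid = True, gridcolor = 'lightgray')
fig.update_xaxes(title_text="Year", row=1, col=2, showgrid = True, gridcolor = 'lightgray')
#Apply the same y-axis title and gridlines to both panels (yaxis_title alone only labels the first panel)
fig.update_yaxes(title_text="Growth (TWh)", showgrid = True, gridcolor = 'lightgray')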

fig.show()
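
The marginal change is a per-country difference between consecutive observations. Each country's first observation has no predecessor, so `diff()` returns NaN there, and a plain sum over a year that contains only NaN values returns 0. The sketch below shows this on made-up data; the column name `renewables_twh` is an assumption for illustration.

```python
import pandas as pd

# Toy generation data for two countries over three years (made-up values).
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "renewables_twh": [1.0, 1.5, 2.5, 10.0, 9.0, 12.0],
}).sort_values(["Entity", "Year"])

# Year-over-year change within each country; the first year of each is NaN.
toy["change"] = toy.groupby("Entity")["renewables_twh"].diff()

# A plain sum treats the all-NaN first year as 0.
summed = toy.groupby("Year")["change"].sum()

# With min_count=1 the first year stays NaN instead of becoming 0.
summed_strict = toy.groupby("Year")["change"].sum(min_count=1)
print(summed.tolist())
```

A zero at the first year of such a chart therefore reflects the missing baseline, not a year without change.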

Visualisation 3.3¶

In [27]:
#Normalising the data by taking into account the population density and land area

df['Estimated Population'] = df['Density\\n(P/Km2)'] * df['Land Area(Km2)']
df['Electricity from fossil fuels (TWh) per capita'] = df['Electricity from fossil fuels (TWh)'] / df['Estimated Population']

# Filtering out major outliers
df_no = df[~df['Entity'].isin(['United States', 'China', 'India', 'Japan'])]


#Plotting the actual figure (choropleth)
fig = px.choropleth(
    df_no,
    locations='Entity',
    color='Electricity from fossil fuels (TWh) per capita',  # Updated to per capita
    title='Global Electricity Usage from Fossil Fuels (TWh) per Capita',  # Updated title
    hover_name='Entity',
    animation_frame='Year',
    projection='natural earth',
    locationmode='country names',
    color_continuous_scale='blues',  # Specify color scale
    range_color=(df_no['Electricity from fossil fuels (TWh) per capita'].min(), df_no['Electricity from fossil fuels (TWh) per capita'].max())  # Set consistent range for color scale
)

# Adjust layout for better aesthetics
fig.update_layout(
    margin=dict(l=0, r=0, t=50, b=0),  # Set margins
    geo=dict(bgcolor='white'),  # Set background color
    annotations=[
        dict(
            text="Excluding major outliers ",
            x=0.01,
            y=0.8,
            showarrow=False,
            font=dict(size=14),
        ), dict(
            text="(US, China, India, Japan)",
            x=0.01,
            y=0.75,
            showarrow=False,
            font=dict(size=14))
    ]
)

fig.update_layout(title={'text': '<b>Global Electricity Usage from Fossil Fuels (TWh) per Capita</b>',
                         'x': 0.359, 'y': 0.95,  # Title anchor: left of centre, near the top
                         'xanchor': 'center', 'yanchor': 'top'})

# Show the map
fig.show()
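
The per capita figure above is built from an estimated population, density x land area. In the sample rows shown earlier, density and land area are identical across years for the same country, so this estimate would be constant over time. The sketch below repeats the arithmetic on made-up values and also converts the result to kWh per person (1 TWh = 1e9 kWh), which gives more readable numbers. The toy column names are assumptions for illustration.

```python
import pandas as pd

# Toy inputs (made-up values).
toy = pd.DataFrame({
    "Entity": ["A", "B"],
    "density_per_km2": [100, 50],
    "land_area_km2": [1000.0, 20000.0],
    "fossil_twh": [2.0, 5.0],
})

# Estimated population = people per square km x square km.
toy["estimated_population"] = toy["density_per_km2"] * toy["land_area_km2"]

# Generation per person, first in TWh and then in kWh.
toy["fossil_twh_per_capita"] = toy["fossil_twh"] / toy["estimated_population"]
toy["fossil_kwh_per_capita"] = toy["fossil_twh_per_capita"] * 1e9
print(toy[["Entity", "fossil_kwh_per_capita"]])
```

Values in TWh per person are tiny fractions, so a kWh scale may make the colour bar of a map easier to read.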